DiALog: A Distributed Model for Capturing Provenance and Auditing Information
Authors
Abstract
Service-oriented systems enable business workflows to span multiple organizations (e.g., by means of Web services). As a side effect, data is more easily transferred across organizational boundaries, which raises privacy issues. At the same time, there are personal, business and legal requirements for protecting privacy and intellectual property rights (IPR) and for allowing customers to request information about how and by whom their data was handled. Managing these requirements constitutes an unsolved technical and organizational problem. The authors propose to solve the information request problem by attaching meta-knowledge about how data was handled to the data itself. The authors present their solution in the form of an architecture, a formalization and an implemented prototype for logging and collecting logs in service-oriented and cross-organizational systems.

DOI: 10.4018/jwsr.2010040101. International Journal of Web Services Research, 7(2), 1-20, April-June 2010. Copyright © 2010, IGI Global.

[...]tion about the processing and whereabouts of his data. The answer must contain details defined by the contract or law (e.g., who processed the data as well as why and how the data has been processed). The answer can be generated in different ways, e.g., by modeling and observing the distributed data processing. However, the answer can only be generated if the model and the observation provide a detailed overview of the processing. Most frequently such an overview is lacking, even for internal workflows and data storage. Hence, we require a model of the distributed processing of data in service-oriented systems, combined with a distributed logging mechanism that collects the information needed to answer the request.
Existing logging mechanisms, like the Extended Log File Format (Hallam-Baker et al., 1996) or syslog (Lonvick, 2001), are not sufficient to gain a full overview of a workflow that is distributed among multiple organizations, because they perform logging in only one execution environment. Because of the diversity of execution environments and the lack of standardized interfaces for exchanging logs, aggregating distributed logs remains a challenge.

In the following, we present DIALOG (DIstributed Auditing LOGs) and sticky logging. DIALOG is a method for auditing the distributed processing of data in service-oriented systems. Sticky logging monitors the processing of data items (independent of the actual business process) and attaches the logs directly to the processed data as metadata. Furthermore, sticky logging allows for the reconstruction of how, by whom and why the data was processed, following the specification of DIALOG. Thus, sticky logging is a generic middleware for distributed logging.

The paper is organized as follows: First, we present a scenario and analyze requirements for collecting information about the processing of private data in service-oriented systems. Following the requirements, we discuss various models for distributed processing of data. Then we introduce DIALOG and define notions of soundness and completeness relevant for auditing in distributed systems. Based on DIALOG, we present the architecture and a prototype implementation of sticky logging. Before we discuss our approach and conclude, we compare it with related work.
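The core idea of sticky logging, logs that travel with the data item as metadata, can be illustrated with a minimal Python sketch. This is not the authors' prototype and does not implement the DIALOG formalization; the names `StickyData`, `LogEntry`, `shop_service`, `shipping_service` and `answer_request`, as well as the organization names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List


@dataclass(frozen=True)
class LogEntry:
    """One processing step: who handled the data, how, and why (hypothetical schema)."""
    actor: str
    action: str
    purpose: str
    timestamp: str


@dataclass
class StickyData:
    """A data item that carries its own processing log as metadata."""
    payload: Any
    log: List[LogEntry] = field(default_factory=list)

    def record(self, actor: str, action: str, purpose: str) -> "StickyData":
        # Append the entry to the item itself, so the log crosses
        # organizational boundaries together with the data.
        now = datetime.now(timezone.utc).isoformat()
        self.log.append(LogEntry(actor, action, purpose, now))
        return self


def shop_service(item: StickyData) -> StickyData:
    # Hypothetical first organization in the workflow.
    return item.record("shop.example", "stored delivery address", "order fulfilment")


def shipping_service(item: StickyData) -> StickyData:
    # Hypothetical second organization receiving the same item.
    return item.record("shipping.example", "read delivery address", "parcel delivery")


def answer_request(item: StickyData) -> List[str]:
    # Reconstruct who processed the data, how, and why, from the attached log.
    return [f"{e.actor}: {e.action} ({e.purpose})" for e in item.log]
```

Passing one item through both services and then calling `answer_request` yields one line per processing step, in order. The sketch omits everything a real deployment would need, such as protecting the attached log against tampering or removal.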
Similar resources
Transparent Web Service Auditing via Network Provenance Functions
Detecting and explaining the nature of attacks in distributed web services is often difficult – determining the nature of suspicious activity requires following the trail of an attacker through a chain of heterogeneous software components including load balancers, proxies, worker nodes, and storage services. Unfortunately, existing forensic solutions cannot provide the necessary context to link...
SPADE: Support for Provenance Auditing in Distributed Environments
SPADE is an open source software infrastructure for data provenance collection and management. The underlying data model used throughout the system is graph-based, consisting of vertices and directed edges that are modeled after the node and relationship types described in the Open Provenance Model. The system has been designed to decouple the collection, storage, and querying of provenance met...
Sketching Distributed Data Provenance
Users can determine the precise origins of their data by collecting detailed provenance records. However, auditing at a finer grain produces large amounts of metadata. To efficiently manage the collected provenance, several provenance management systems, including SPADE, record provenance on the hosts where it is generated. Distributed provenance raises the issue of efficient reconstruction dur...
Big Data Provenance: Challenges and Implications for Benchmarking
Data Provenance is information about the origin and creation process of data. Such information is useful for debugging data and transformations, auditing, evaluating the quality of and trust in data, modelling authenticity, and implementing access control for derived data. Provenance has been studied by the database, workflow, and distributed systems communities, but provenance for Big Data whi...
Unified Platform for Secure Networked Information Systems
In this paper, we present a unified declarative platform for specifying, implementing, analyzing and auditing large-scale secure information systems. Our proposed system builds upon techniques from logic-based trust management systems, declarative networking, and data analysis via provenance. First, we propose the Secure Network Datalog (SeNDlog) language that unifies Binder, a logic-based lang...
Journal title:
- Int. J. Web Service Res.
Volume 7, Issue
Pages -
Publication date 2010